Query-Aware Locality-Sensitive Hashing for Approximate Nearest Neighbor Search
Abstract
Locality-Sensitive Hashing (LSH) and its variants are well-known indexing schemes for the c-Approximate Nearest Neighbor (c-ANN) search problem in high-dimensional Euclidean space. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects close to a query may then be partitioned into buckets different from the query's, which is undesirable. Because they rely on query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an integer approximation ratio c ≥ 2. In this paper, we introduce a novel concept of query-aware bucket partition, which uses a given query as the “anchor” for bucket partition. Accordingly, a query-aware LSH function is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed. We propose a novel query-aware LSH scheme named QALSH for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH functions enables QALSH to work with any approximation ratio c > 1. Extensive experiments show that QALSH outperforms C2LSH and LSB-Forest, especially in high-dimensional space. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.
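The following is a minimal Python sketch that contrasts the two kinds of bucket partition for a single random projection. It is not code from the paper. The query-oblivious form h(o) = floor((a·o + b)/w) is the standard E2LSH-style function with random shift b. The query-aware collision rule |a·o − a·q| ≤ w/2 is an assumed reading of "a random projection anchored at the query." All function names (random_projection, oblivious_collide, query_aware_collide, is_c_ann, and so on) are hypothetical. The helper is_c_ann only states the c-ANN guarantee that such an index aims to meet.

```python
import math
import random


def random_projection(dim, rng):
    # a ~ N(0, I_d): a Gaussian (2-stable) projection vector, as used by
    # Euclidean LSH families.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project(a, o):
    # Scalar projection a . o of object o onto the random direction a.
    return sum(ai * oi for ai, oi in zip(a, o))


def oblivious_bucket(a, b, w, o):
    # Query-oblivious hash (E2LSH-style, assumed form):
    # h(o) = floor((a . o + b) / w), with random shift b in [0, w).
    # Bucket boundaries are fixed before any query arrives.
    return math.floor((project(a, o) + b) / w)


def oblivious_collide(a, b, w, o, q):
    # o and q collide only if they land in the same pre-partitioned bucket.
    return oblivious_bucket(a, b, w, o) == oblivious_bucket(a, b, w, q)


def query_aware_collide(a, w, o, q):
    # Query-aware rule (assumed form): h(o) = a . o with no shift, and the
    # bucket is the width-w interval centred on the query's projection h(q).
    return abs(project(a, o) - project(a, q)) <= w / 2


def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def is_c_ann(answer, q, data, c):
    # c-ANN guarantee: the answer is at most c times farther from q
    # than the true nearest neighbour in data (c > 1).
    nn_dist = min(euclidean(o, q) for o in data)
    return euclidean(answer, q) <= c * nn_dist


if __name__ == "__main__":
    # One-dimensional toy setup: a = [1.0], shift b = 0, bucket width w = 1.
    a, b, w = [1.0], 0.0, 1.0
    q, o = [0.99], [1.01]
    # The fixed boundary at 1.0 separates two very close points.
    print(oblivious_collide(a, b, w, o, q))   # False
    # The query-anchored interval [0.49, 1.49] still contains o.
    print(query_aware_collide(a, w, o, q))    # True
    # Any ratio c > 1 can be checked, including non-integer c < 2.
    data = [[1.0], [-1.5], [4.0]]
    print(is_c_ann([-1.5], [0.0], data, 1.5))  # True: 1.5 <= 1.5 * 1.0
    print(is_c_ann([-1.5], [0.0], data, 1.2))  # False: 1.5 > 1.2
```

In the toy run, the two points project to 0.99 and 1.01. They fall on opposite sides of the fixed boundary at 1.0, so the oblivious function separates them. The interval anchored at h(q) still captures the close point. A real index would combine many such projections; that machinery is outside the scope of this sketch.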
Similar resources
Approximate Nearest Neighbor Search in ℓp
We present a new locality sensitive hashing (LSH) algorithm for c-approximate nearest neighbor search in $\ell_p$ with $1 < p < 2$. For a database of $n$ points in $\ell_p$, we achieve $O(dn^{\rho})$ query time and $O(dn + n^{1+\rho})$ space, where $\rho \le O((\ln c)^2/c^p)$. This improves upon the previous best upper bound $\rho \le 1/c$ by Datar et al. (SOCG 2004), and is close to the lower bound $\rho \ge 1/c^p$ by O’Donnell, Wu and Zhou (ITCS 2011...
SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space
Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH that has been successfully applied in many multimedia applications. However, the Euclidean LSH presents limitations that affect structure and query performances. The main limitation of the Euclidean LS...
Efficient Search in Document Image Collections
This paper presents an efficient indexing and retrieval scheme for searching in document image databases. In many non-European languages, optical character recognizers are not very accurate. Word spotting (word image matching) may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic time warping and/or nearest neighbor sea...
Locality-Sensitive Hashing Without False Negatives for l_p
In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., which ensure collision for every pair of points within a given radius R in d-dimensional space equipped with the ℓp norm when p ∈ [1,∞]. Furthermore, we show how to use these hash functions to solve the c-approximate nearest neighbor search problem without false negatives. Namely, if there is a...
Journal: PVLDB
Volume: 9
Publication date: 2015